Third Eye: API for Internet Archive TV News chyrons
How we turn TV News chyrons into data
Cable TV news channels display chyrons on the "lower thirds" of the screen to show breaking news and other highlights. Using the Internet Archive TV News, TV architect Tracey Jaquith built the Third Eye to scan the lower part of the screen and apply OCR (optical character recognition) to turn the words into text.
Third Eye captures four cable TV news channels: BBC News, CNN, Fox News, and MSNBC.
The project launched with four million chyrons captured in just over two weeks.
TV chyron "lower third" and OCR example: | V | V AFTER WH MEETING, SCHUMER DISHES WHEN HE THOUGHT NIC WAS OFF
Filtered chyrons turned into tweets
Because chyrons are created in near real time, they can sometimes include misspellings; in addition, the OCR process can return some messy text. Jaquith has adapted algorithms to find the most representative and clearest chyrons for every 60-second period. This cleaned-up feed fuels the Twitter bots that post, in near real time, which chyrons are appearing on TV news screens.
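The filtering algorithms themselves are not described here, so the following is only a minimal sketch of one possible approach, not the method Third Eye uses. It assumes entries arrive as (epoch seconds, OCR text) pairs, groups them into 60-second windows, and keeps the text most similar to the others in its window (a medoid), on the assumption that clean readings repeat while OCR noise varies. The function names `most_representative` and `filter_by_minute` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def most_representative(texts):
    """Return the text with the highest total similarity to the other texts."""
    if len(texts) == 1:
        return texts[0]

    def score(i):
        # Sum of similarity ratios between texts[i] and every other text.
        return sum(
            SequenceMatcher(None, texts[i], other).ratio()
            for j, other in enumerate(texts)
            if j != i
        )

    # On ties, max() keeps the earliest candidate.
    return texts[max(range(len(texts)), key=score)]


def filter_by_minute(entries):
    """Group (epoch_seconds, text) pairs into 60-second windows.

    Returns a dict mapping each window's start time to its representative text.
    """
    windows = defaultdict(list)
    for timestamp, text in entries:
        windows[timestamp - timestamp % 60].append(text)
    return {start: most_representative(texts) for start, texts in sorted(windows.items())}
```

A real filter would likely also need to handle misspellings that appear consistently on screen, which a similarity vote alone cannot correct.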
Image lookups and Video links
Image lookup: You can copy a row from our API and do an image lookup to see what we OCR-ed. We have approximately the prior six months available. Paste the row and press the [show image] button below.
Video links: Once 12 hours have passed since the end of a program, you can use the [show video] button below to see the clip around the OCR-ed region.
- Chyrons are derived in near real time from the Internet Archive TV News collection. The constantly updating public collection contains 1.4 million TV news shows, some dating back to 2009.
- Result times are approximate, because of storage and retrieval efficiency concerns. A request for the "most recent 3 hours" will typically return some extra data.
- Alternatively, you can find and use the daily raw .tsv files directly (from this item).
- Data can be affected by temporary collection outages, which typically last minutes or hours, but rarely longer. If you are concerned about a specific time gap in a feed and would like to know if it's the result of an outage, please inquire at firstname.lastname@example.org.
- The "raw feed" option provides all of the OCR'ed text from chyrons at the rate of approximately one entry per second. The "filtered tweets" download provides the data feed that fuels our Twitter bots; this has been filtered to find the most representative, clearest chyrons from a 60-second period. The filtered feed relies on algorithms that are a work in progress; we invite you to share your ideas on how to effectively filter the noise from the raw data.
- Dates/times are in UTC (Coordinated Universal Time) in API feeds, and in PST (Pacific Standard Time) in tweets.
- Because the size of the raw data is so large (about 20 megabytes per day), we limit results to seven days per request.
- We began collecting raw data on August 25, 2017; the filtered feed begins on September 7, 2017.
- To open a TSV file in a program such as Google Sheets or Excel, first download the text file: check the box next to "Save results to file" to save it on your computer. Then use the program's import function to pull in the text file, using "tab" as your delimiter.
- The "Duration" column is in seconds: the amount of time that particular chyron appeared on the screen.
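The notes above can be sketched in Python as follows. This is a rough illustration rather than an official client: the column order in `COLUMNS`, the timestamp format, and the function names are all assumptions, so check them against an actual download. It reads tab-delimited rows, treats times as UTC and durations as seconds, converts a time to a fixed UTC-8 offset for PST (ignoring daylight saving), and splits a longer date range into pieces of at most seven days, assuming the limit counts both end dates.

```python
import csv
import io
from datetime import date, datetime, timedelta, timezone

# Assumed column order and names; verify against a real .tsv download.
COLUMNS = ["time_utc", "channel", "duration", "details", "text"]
# Fixed UTC-8 offset for Pacific Standard Time; daylight saving is ignored.
PST = timezone(timedelta(hours=-8), "PST")


def parse_rows(tsv_text):
    """Parse tab-delimited text into dicts with typed time and duration."""
    # QUOTE_NONE keeps stray quote characters in OCR text from confusing the parser.
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row = dict(zip(COLUMNS, fields))
        # Assumed timestamp format: "YYYY-MM-DD HH:MM:SS", in UTC.
        row["time_utc"] = datetime.strptime(
            row["time_utc"], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        row["duration"] = int(row["duration"])  # seconds on screen
        rows.append(row)
    return rows


def to_pst(time_utc):
    """Convert an aware UTC datetime to the fixed PST offset."""
    return time_utc.astimezone(PST)


def seven_day_chunks(start, end):
    """Split an inclusive date range into inclusive ranges of at most 7 days."""
    chunks = []
    while start <= end:
        chunk_end = min(start + timedelta(days=6), end)
        chunks.append((start, chunk_end))
        start = chunk_end + timedelta(days=1)
    return chunks
```

For example, `seven_day_chunks(date(2017, 9, 7), date(2017, 9, 25))` yields three ranges, each of which could be requested separately and the results concatenated.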
To view clips in context on the Internet Archive TV News,
before the field that begins with a channel name.